ABSTRACT

When the highest-way association is present in a
3-way cross-classification of frequencies, standard logit and
loglinear models have as many parameters as there are cells in the
table; that is, the models are "saturated." Extensions of logit and
loglinear models are described here that provide more parsimonious
alternatives to saturated models. The new models, logit
multiplicative models, and their equivalent log multiplicative
models, are introduced here for the case where there is one
dichotomous response or criterion variable and two (polytomous)
explanatory or predictor variables. In logit multiplicative models,
the interaction between the explanatory variables is represented by
the product of scale values for the categories of the explanatory
variables and a measure of the strength of the association. Plots of
the scale values provide graphical representations and descriptions
of the interaction. The new models are illustrated by modeling a
3-way interaction between whether an elementary school student
attends an extracurricular tutoring program, the highest educational
level attained by the student's father, and the student's grade
level. (Contains 1 figure, 3 tables, and 33 references.) (Author)
Logit Multiplicative Models: Alternatives to Saturated
Logit/Loglinear Models for 3-Way Tables

Abstract

When the highest-way association is present in a 3-way
cross-classification of frequencies, standard logit and loglinear models have
as many parameters as there are cells in the table; that is, the models
are "saturated". Extensions of logit and loglinear models are described
here that provide more parsimonious alternatives to saturated models.
The new models, logit multiplicative models and their equivalent log
multiplicative models, are introduced here for the case where there is
one dichotomous response or criterion variable and two (polytomous)
explanatory or predictor variables. In logit multiplicative models, the
interaction between the explanatory variables is represented by the
product of scale values for the categories of the explanatory variables
and a measure of the strength of the association. Plots of the scale
values provide graphical representations and descriptions of the
interaction. The new models are illustrated by modeling a 3-way
interaction between whether an elementary school student attends an
extracurricular tutoring program, the highest educational level attained by
the student's father, and the student's grade level.

Keywords: loglinear models, logit models, 3-way interactions, latent
variables, scaling, categorical data analysis.
1 Introduction



Variables in educational and social science research are often discrete, such
as gender, race, grade level, types of programs of study, whether a course is
passed or failed, the highest degree earned by a parent, plans after
graduation, desired occupation, and actual occupation. Other variables are
continuous, but are measured discretely, such as socio-economic status, ability,
and achievement. With respect to ability and/or achievement, the observed
variables are typically whether a student selects a correct or incorrect answer
on an objective test item, or the actual response option selected on a multiple
choice item.

The standard approach to analyzing multivariate categorical data is to use
loglinear or logit models. These models are extremely useful for identifying
interactions that are present in multivariate categorical data; however,
they are not as useful for helping to describe the nature of the relationships
that do exist. Unlike continuous variables, where a single number such as a
correlation coefficient is sufficient to summarize the association between two
variables, the number of statistics (i.e., odds ratios) needed to characterize
the association between categorical variables is an increasing function of the
number of categories of the variables. For example, the minimum number of
odds ratios that are needed to completely describe the relationship between
just two variables each of which has 4 levels is $(4 - 1)(4 - 1) = 9$. When
associations exist among three or more variables, the number of statistics needed
to describe the relationship is larger, making the problem of interpreting and
describing interactions even more difficult.

In the case of two categorical variables, the multidimensional row-column
or "RC(M)" association model developed by Goodman (1979, 1985, 1986,
1991) is extremely useful for summarizing and describing the relationship
between two variables (also see Agresti, [year illegible]; Clogg & Shihadeh, 1994;
Wickens, 1989). The RC(M) association model is an extension of the loglinear
model for two-way tables. Interactions are represented in RC(M) models by
the product of scale values assigned to the categories of the variables and a
measure of the strength of the relationship. Plots of scale values provide
graphical representations of the association between variables, which greatly aid
interpretation. Numerous generalizations of the RC(M) association model
for three or more variables have been proposed (Anderson, 1996; Becker,
1989a; Becker & Clogg, 1989; Choulakian, [year illegible]; Clogg, 1982a, 1982b;
Gilula & Haberman, 1988; Goodman, 1986; Mooijaart, 1991); however, the
existing model generalizations do not address the situation where one variable
is a dichotomous response or criterion variable and the other variables are
explanatory variables. It is this situation that we are concerned with here.

Loglinear and RC(M) association models represent the relationships
between categorical variables without making distinctions regarding the role
that particular variables play in an analysis. When one variable is a response
or criterion variable and the rest of the variables are explanatory or predictor
variables, logit models are often preferable to loglinear models, even though
logit models are equivalent to loglinear models. Logit models are simpler than
their equivalent loglinear models: they only include terms that
represent the associations between the criterion and the explanatory variables,
and they do not contain terms that represent the relationship between the
explanatory variables. Extensions of logit models, logit multiplicative
models, are presented here that are analogous to the RC(M) association model.
In the logit multiplicative models, the interactions between the explanatory
variables are represented by products of scale values multiplied by a
measure of the strength of the association. The models are equivalent to one of
the models in the family of association model generalizations proposed by
Anderson (1996).

In Section 2, the logit multiplicative model for the specific case of a 
dichotomous response variable and two explanatory variables is described. 
Since fitting the models is a non-trivial problem and cannot be done using 
standard procedures in readily available statistical software packages, two 
methods currently available for estimating the new models are discussed in 
Section 3. In Section 4, the models are used to analyze data from a study
by Hsieh (1996) on the effects of extra-curricular tutoring programs on
mathematics achievement in elementary school children in Taiwan.

2 The Logit Multiplicative Model 

In Section 2.1, the logit multiplicative model is presented as an extension 
of a standard logit model, followed in Section 2.2 by a discussion of the 
identification constraints needed to estimate model parameters. Lastly, in 
Section 2.3, the interpretation of the model and its parameters is discussed. 
2.1 The Basic Model



Let $F_{ijk}$ equal the number of individuals (students, parents, subjects,
objects, etc.) who fall into categories $i$, $j$, and $k$ of variables A, B, and C,
respectively, where variables A and B are explanatory variables and variable
C is a dichotomous response variable (i.e., $k = 1, 2$). The number of
categories of variable A equals $I$ (i.e., $i = 1, \ldots, I$) and the number of categories
of variable B equals $J$ (i.e., $j = 1, \ldots, J$).

We start with the standard logit model with two explanatory variables
where the "dependent" variable is the odds of one response versus the other
(i.e., $F_{ij1}/F_{ij2}$). The most complex or saturated model is

$$\frac{F_{ij1}}{F_{ij2}} = \beta\,\beta_{A(i)}\,\beta_{B(j)}\,\beta_{AB(ij)} \tag{1}$$

where $\beta$ is a constant, $\beta_{A(i)}$ and $\beta_{B(j)}$ are "main" or marginal effect terms
for variables A and B, respectively, and $\beta_{AB(ij)}$ is the interaction term. Since
the odds of response 1 versus response 2 is a multiplicative function of the
model parameters, the logarithm of $F_{ij1}/F_{ij2}$ is a linear function of model
parameters; that is,

$$\ln\left(\frac{F_{ij1}}{F_{ij2}}\right) = \tau + \tau_{A(i)} + \tau_{B(j)} + \tau_{AB(ij)} \tag{2}$$

where $\tau = \ln(\beta)$ is a constant, $\tau_{A(i)} = \ln(\beta_{A(i)})$ and $\tau_{B(j)} = \ln(\beta_{B(j)})$ are
"main" effect terms, and $\tau_{AB(ij)} = \ln(\beta_{AB(ij)})$ is the interaction term. This
model has as many unique parameters as there are data points, which in
this case equals the number of odds that can be formed for the various
combinations of variables A and B (i.e., $I \times J$). The saturated logit model
(equations 1 and 2) will always fit the data perfectly.

To obtain a more parsimonious and simpler representation of the data, we
note that the interaction terms $\tau_{AB(ij)}$ in equation 2 (or $\beta_{AB(ij)}$ in equation 1)
are unstructured in the sense that they equal whatever they need to equal
so that the data are fit perfectly. The new models presented here impose a
multiplicative structure on these terms and break the interaction down into
component pieces as follows:

$$\ln\left(\frac{F_{ij1}}{F_{ij2}}\right) = \tau + \tau_{A(i)} + \tau_{B(j)} + \sum_{m=1}^{M} \phi_m \mu_{im} \nu_{jm} \tag{3}$$
where $M$ equals the number of components or dimensions used to represent
the interaction, $\mu_{im}$ and $\nu_{jm}$ are scale values for categories $i$ and $j$ of variables
A and B, respectively, on dimension $m$, and $\phi_m$ is a measure of the strength of
the interaction between A and B on dimension $m$. When $M = \min(I, J) - 1$,
equation 3 is equivalent to the saturated logit model; however, when
$M < \min(I, J) - 1$, the model is not saturated and provides a summary of the
interaction. In practice, models such as equation 3, the RC(M) association
model and its various generalizations, typically only need a small number of
dimensions (i.e., 1 or 2) to adequately fit data.
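
As a minimal sketch of how the terms of equation 3 combine, the following
Python function evaluates the log-odds for every cell of an I x J table from a
set of parameter values. The function name, the argument names, and the
numerical values are hypothetical and chosen only for illustration; they are
not estimates from any data set.

```python
import numpy as np

def logit_multiplicative(tau, tau_a, tau_b, phi, mu, nu):
    """Log-odds ln(F_ij1 / F_ij2) under equation 3.

    tau   : scalar constant
    tau_a : shape (I,)    main effect terms for variable A
    tau_b : shape (J,)    main effect terms for variable B
    phi   : shape (M,)    association parameters
    mu    : shape (I, M)  scale values for the categories of A
    nu    : shape (J, M)  scale values for the categories of B
    """
    tau_a, tau_b, phi = (np.asarray(v, float) for v in (tau_a, tau_b, phi))
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    interaction = (mu * phi) @ nu.T      # sum over m of phi_m * mu_im * nu_jm
    return tau + tau_a[:, None] + tau_b[None, :] + interaction

# Hypothetical parameter values for a 3 x 4 table with M = 1.
logits = logit_multiplicative(
    tau=0.2,
    tau_a=[0.5, 0.0, -0.5],
    tau_b=[0.3, 0.1, -0.1, -0.3],
    phi=[2.0],
    mu=[[0.5], [0.0], [-0.5]],
    nu=[[1.0], [0.0], [0.0], [-1.0]],
)
```

Setting every element of phi to zero in this sketch gives the additive
(no interaction) logit model.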

Assigning numbers to the categories of "ordinal" variables seems natural,
but what about scaling the categories of "nominal" variables? While no a
priori ordering of categories may exist (i.e., a variable is "nominal"), when
considering the relationship between observed variables, there may be an
ordering of the categories on some underlying or latent (continuous) dimension.
Even when categories have an a priori ordering (i.e., an "ordinal" variable),
this ordering may not be the appropriate one for describing an interaction
between variables. Since the parameters of model 3, including the scale values
and association parameters, are estimated from the data, we can discover the
appropriate ordering and relative spacing between categories that is needed
to summarize the interactions by fitting the model to data.

2.2 Identification Constraints 

To estimate the parameters of equation 3, identification constraints on the
model parameters are required. These constraints do not affect the predicted
or "fitted" values, and thus do not affect how well the model fits the data. The
identification constraints do affect the actual numerical values of estimated
parameters. The constraints imposed on the $\tau_{A(i)}$'s and $\tau_{B(j)}$'s are the same
as those typically imposed on the analogous terms in logit and loglinear
models; that is, either a particular term is set equal to a constant (e.g.,
$\tau_{A(1)} = \tau_{B(1)} = 0$), or the sum of the terms for a variable is set equal to a
constant (e.g., $\sum_{i=1}^{I} \tau_{A(i)} = \sum_{j=1}^{J} \tau_{B(j)} = 0$).

The sets of scale values for each variable need to be centered and scaled.
The centering constraints set the location of the scale. The centering
constraints used here are

$$\sum_{i=1}^{I} \mu_{im} h_{A(i)} = 0 \tag{4}$$

$$\sum_{j=1}^{J} \nu_{jm} h_{B(j)} = 0 \tag{5}$$

where $h_{A(i)}$ and $h_{B(j)}$ are fixed and known weights for categories $i$ and $j$ of
variables A and B, respectively. Possible choices for $h_{A(i)}$ and $h_{B(j)}$ include
unit weights, uniform weights (i.e., $h_{A(i)} = 1/I$ and $h_{B(j)} = 1/J$), or marginal
probabilities. Becker and Clogg (1989) discuss the choice of weights for the
RC(M) association model, and their results apply here as well.

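The scaling constraints used here are

$$\sum_{i=1}^{I} \mu_{im} \mu_{im'} h_{A(i)} = \delta_{mm'} \tag{6}$$

$$\sum_{j=1}^{J} \nu_{jm} \nu_{jm'} h_{B(j)} = \delta_{mm'} \tag{7}$$

where $\delta_{mm'} = 1$ for $m = m'$ and 0 for $m \neq m'$. These constraints set the unit
of measurement and constrain the scales to be orthogonal across dimensions.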

In sum, the constraints on the scale values can be thought of as setting
the means of the sets of scale values equal to zero, the variances equal to one,
and the covariances (or correlations) between scales equal to zero. Thus, the
scale values are an interval level measure on underlying continuous variables.
Linear transformations of the scale values will not affect the fitted (predicted)
values.
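
For a single dimension, the centering and scaling constraints amount to a
linear transformation of any provisional set of scale values. The following
Python sketch shows one way this could be done; the function name and the
example values are hypothetical, and the orthogonality constraint across
dimensions (needed when M > 1) is not handled here.

```python
import numpy as np

def standardize_scale(x, h):
    """Return a*x + b with weighted sum 0 and weighted sum of squares 1.

    x : provisional scale values for the categories of one variable
    h : fixed, known weights for those categories (e.g., unit weights)
    """
    x, h = np.asarray(x, float), np.asarray(h, float)
    centered = x - np.sum(h * x) / np.sum(h)       # centering constraint
    return centered / np.sqrt(np.sum(h * centered ** 2))   # scaling constraint

# Hypothetical provisional values for a four category variable, unit weights.
h_unit = np.ones(4)
mu = standardize_scale([1.0, 2.0, 3.0, 6.0], h_unit)
```

Because only a location shift and a rescaling are applied, the ordering and
relative spacing of the categories are unchanged.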

Given the identification constraints, the degrees of freedom for the logit
multiplicative model in equation 3 can now be computed. The degrees of
freedom equals the number of data points minus the number of unique
parameters (i.e., the number of parameters in equation 3 minus the number of
constraints needed to identify them). Thus, the degrees of freedom equal

$$df = (I - M - 1)(J - M - 1) \tag{8}$$
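
The degrees of freedom can be checked by direct counting. The Python sketch
below compares equation 8 with the number of odds minus the number of unique
parameters; the function names are hypothetical, and the tally of constraints
(one centering constraint per scale per dimension, plus M(M + 1)/2 scaling
and orthogonality constraints for each variable) is our reading of
equations 4 through 7.

```python
def df_equation_8(I, J, M):
    """Degrees of freedom from equation 8."""
    return (I - M - 1) * (J - M - 1)

def df_by_counting(I, J, M):
    """Number of odds minus the number of unique parameters."""
    n_odds = I * J
    main = 1 + (I - 1) + (J - 1)           # tau, tau_A(i), tau_B(j)
    bilinear = M + I * M + J * M           # phi_m, mu_im, nu_jm
    constraints = 2 * M + M * (M + 1)      # centering, then scaling/orthogonality
    return n_odds - (main + bilinear - constraints)

# A 4 x 6 table of odds with M = 0, 1, 2, 3 dimensions.
dfs = [df_equation_8(4, 6, M) for M in range(4)]
```

For a 4 x 6 table this gives 15, 8, 3 and 0 degrees of freedom, which agrees
with the df column of Table 2.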



2.3 Interpretation of Model Parameters 

In logit multiplicative models, as well as in standard logit, loglinear, and 
RC(M) association models, interactions between variables are defined in 
terms of odds ratios. A direct relationship exists between odds ratios and 
the model parameters. Since interactions between categorical variables are 
defined in terms of odds ratios (and for three variables, ratios of odds ratios), 
this relationship has implications regarding the proper interpretation of the 
logit multiplicative model. 

Let $\theta_{ii'(j)}$ equal the ratio of the odds $F_{ij1}/F_{ij2}$ to $F_{i'j1}/F_{i'j2}$, which is an
odds ratio for variables A and C conditional on category $j$ of variable B;
that is,

$$\theta_{ii'(j)} = \frac{F_{ij1}/F_{ij2}}{F_{i'j1}/F_{i'j2}} = \frac{F_{ij1}F_{i'j2}}{F_{ij2}F_{i'j1}}$$

If there is no interaction between variables A and B in their relationship to
variable C (i.e., no 3-way association among variables A, B and C), then
the conditional odds ratio given category $j$ equals the conditional odds ratio
for any other category of variable B (i.e., $\theta_{ii'(j)} = \theta_{ii'(j')}$ for all
$i, i' = 1, \ldots, I$, and $j, j' = 1, \ldots, J$). Alternatively, we can consider odds
ratios conditioning on the categories of variable A; that is,

$$\theta_{jj'(i)} = \frac{F_{ij1}/F_{ij2}}{F_{ij'1}/F_{ij'2}} = \frac{F_{ij1}F_{ij'2}}{F_{ij2}F_{ij'1}}$$

If there is no interaction between variables A, B and C, then the conditional
odds ratios for all categories of variable A will all be equal (i.e.,
$\theta_{jj'(i)} = \theta_{jj'(i')}$ for all $i, i' = 1, \ldots, I$, and $j, j' = 1, \ldots, J$). Thus, when
no 3-way association exists,

$$\Theta_{ii',jj'} = \frac{\theta_{ii'(j)}}{\theta_{ii'(j')}} = \frac{\theta_{jj'(i)}}{\theta_{jj'(i')}} = 1$$

If a 3-way interaction does exist, then $\Theta_{ii',jj'} \neq 1$, or equivalently
$\ln(\Theta_{ii',jj'}) \neq 0$, for at least some $i, i' = 1, \ldots, I$, and $j, j' = 1, \ldots, J$.

In terms of the parameters of the logit multiplicative model, the logarithm
of the ratio of conditional odds ratios equals

$$\ln(\Theta_{ii',jj'}) = \sum_{m=1}^{M} \phi_m (\mu_{im} - \mu_{i'm})(\nu_{jm} - \nu_{j'm}) \tag{9}$$
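
Equation 9 can be verified numerically: the main effect terms cancel when the
log ratio of conditional odds ratios is formed as a double difference of
log-odds. The Python sketch below draws arbitrary (hypothetical) parameter
values, builds the log-odds from equation 3, and computes the same quantity
both ways; all names are ours and are used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, M = 4, 6, 2

# Arbitrary parameter values, used only to check the algebra.
tau = rng.normal()
tau_a, tau_b = rng.normal(size=I), rng.normal(size=J)
phi = rng.normal(size=M)
mu, nu = rng.normal(size=(I, M)), rng.normal(size=(J, M))

# Log-odds for every cell under equation 3.
logit = tau + tau_a[:, None] + tau_b[None, :] + (mu * phi) @ nu.T

def log_ratio_from_logits(i, i2, j, j2):
    """Log ratio of conditional odds ratios as a double difference of log-odds."""
    return (logit[i, j] - logit[i2, j]) - (logit[i, j2] - logit[i2, j2])

def log_ratio_from_params(i, i2, j, j2):
    """The same quantity from the scale values and association parameters."""
    return np.sum(phi * (mu[i] - mu[i2]) * (nu[j] - nu[j2]))

lhs = log_ratio_from_logits(0, 3, 1, 5)
rhs = log_ratio_from_params(0, 3, 1, 5)
```

The two values agree up to rounding error for any choice of categories, which
illustrates that only the bilinear terms carry the 3-way association.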
The associations in the data that are attributable to the 3-way relationship
are represented by the scale values and the association parameter, and not
by any of the other terms in the model. The scale values provide
information about the structure of the interaction between variables A and B in
their relation with variable C. Since the scale values provide an interval
level measure on latent continuous variables, only the relative differences
between scale values for the categories of variable A are meaningful, as well
as the relative differences between the scale values for categories of variable
B. Categories with relatively larger differences between their scale
values have (conditional) odds ratios that are more dissimilar, and thus their
ratio $\Theta_{ii',jj'}$ is further from 1 than it is for categories with relatively smaller
differences. Alternatively, categories with nearly equivalent scale values have
nearly equivalent odds ratios, and thus $\Theta_{ii',jj'}$ for these categories will be
close to 1, the point of no association.

Plots of scale values provide "pictures" of the relationship among the
variables. Such "pictures" are graphical representations of all possible ratios
of odds ratios that can be formed. These plots greatly facilitate the
substantive interpretation and description of interactions in data. The geometry and
interpretation of such plots are similar to those of plots of scale values from the
RC(M) association model (see Goodman, 1986; Clogg, 1986), except that
interactions are defined in terms of ratios of odds ratios rather than just odds
ratios. In these plots, the relative distances between points provide
information about the relationship between the variables. An example of such a
plot and its interpretation is given in Section 4.

To gain insight into the meaning and interpretation of $\phi$, we first examine
the simple case of the one dimensional model where

$$\ln(\Theta_{ii',jj'}) = \phi(\mu_i - \mu_{i'})(\nu_j - \nu_{j'})$$

For a one unit change in the scale for variable A (i.e., $(\mu_i - \mu_{i'}) = 1$) and
a one unit change in the scale for variable B (i.e., $(\nu_j - \nu_{j'}) = 1$), $\phi$ is the
logarithm of the ratio of conditional odds ratios. In other words, the
association parameter $\phi$ is a measure of the strength of the relationship between
variables A, B and C. In the case of two or more dimensions, $\phi_m$ measures
the strength of the relationship on the $m$th latent dimension.

At times, we may wish to consider the "effect" of one variable holding the
other variable constant; that is, we may want to consider what happens to
the odds $F_{ij1}/F_{ij2}$ when we change to category $i'$ of variable A. To examine
such effects, we look at the appropriate conditional odds ratios, which in this
case is $\theta_{ii'(j)}$. In terms of the logit model parameters, this equals

$$\ln(\theta_{ii'(j)}) = (\tau_{A(i)} - \tau_{A(i')}) + \sum_{m} \phi_m (\mu_{im} - \mu_{i'm}) \nu_{jm}$$

Thus, the difference between the log-odds for category $i$ and that for
category $i'$ depends on both the main effect of variable A and the interaction
effect between variables A and B. Alternatively, if we hold the level of
variable A constant and examine the change in odds between categories $j$
and $j'$ of variable B, we find that the change depends on both
the marginal effect of B and the interaction between A and B; that is,

$$\ln(\theta_{jj'(i)}) = (\tau_{B(j)} - \tau_{B(j')}) + \sum_{m} \phi_m \mu_{im} (\nu_{jm} - \nu_{j'm})$$

Rather than odds and odds ratios, for some purposes it is more convenient
to use predicted probabilities or relative frequencies. The fitted odds can be
transformed to probabilities as follows:

$$\pi_{ij1} = \frac{1}{1 + \exp\left(-\left(\tau + \tau_{A(i)} + \tau_{B(j)} + \sum_{m} \phi_m \mu_{im} \nu_{jm}\right)\right)} \tag{10}$$

where $\pi_{ij1}$ is the predicted probability of response 1 given categories $i$ and
$j$ of variables A and B, respectively. However, with respect to interpreting
the model parameters, the interpretation using equation 10 is not as simple,
straightforward, or direct as it is when we discuss odds and odds ratios.
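
As an illustration of equation 10, the Python sketch below converts fitted
log-odds into predicted probabilities and, given the total number of
individuals in each combination of A and B, into fitted frequencies for the
two responses. The function names and the small numerical example are
hypothetical.

```python
import numpy as np

def fitted_probabilities(logits):
    """Equation 10: probability of response 1 from the fitted log-odds."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, float)))

def fitted_counts(logits, n):
    """Fitted frequencies for responses 1 and 2, given totals n_ij = F_ij1 + F_ij2."""
    p = fitted_probabilities(logits)
    n = np.asarray(n, float)
    return n * p, n * (1.0 - p)

# Hypothetical log-odds for two cells: even odds, and odds of 3 to 1.
example_logits = [[0.0, np.log(3.0)]]
probs = fitted_probabilities(example_logits)          # 0.5 and 0.75
f1, f2 = fitted_counts(example_logits, [[10, 8]])     # (5, 6) and (5, 2)
```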



3 Estimation 

While the model can be estimated by least squares, only maximum
likelihood estimation under the standard sampling assumptions of independent,
homogeneous observations from either a Binomial or Poisson distribution is
discussed here. The same inherent difficulties encountered when
estimating the RC(M) association model apply to estimating logit multiplicative
models (see Haberman (1995) for a discussion of the difficulties involved in
estimating the RC(M) association model). Two currently available methods
of estimating the models are briefly outlined here. One method uses
common statistical packages, but this method requires specially written modules
and can only be used to estimate a one dimensional model (i.e., $M = 1$).
This method is described in Section 3.1. The other method makes use of the
equivalence between the logit multiplicative model and a special case of one
model from the family of models proposed by Anderson (1996). This family
of models, 3-mode association models, are generalizations of the RC(M)
association model to 3-way tables. This latter method makes use of a program
written to estimate the entire family of 3-mode association models
developed by Anderson (1996). In Section 3.2, the equivalence between the logit
multiplicative model and the 3-mode association model is given, as well as
some general comments about the algorithm used in the program.

3.1 Uni-Dimensional Model 

The uni-dimensional model can be estimated using software that estimates
generalized linear models (Dobson, 1990; McCullagh & Nelder, 1990), such as
GLIM (Francis, Green & Payne, 1993) or SAS/GENMOD (SAS Inc., 1994).
Generalized linear models are extensions of traditional linear models that 
have two basic parts: a structural component and a random component. 
The structural component is a linear function of the predictor or explana- 
tory variables. The random component is a probability distribution for the 
response variable, which can be any distribution from an exponential family 
of distributions. The “link function” describes how the mean of the response 
variable is related to the linear predictor. The procedure used to estimate 
generalized linear models can be used to fit the one dimensional logit multi- 
plicative model. 

Both the one dimensional RC model and the logit model are generalized 
bilinear models (as opposed to generalized linear models). The RC model 
has a log link function and its random component is the Poisson distribution, 
while the logit bilinear model has a logit link function and its random com- 
ponent is the Binomial distribution. By changing the link function and the 
distribution, the algorithm given by Becker (1989b) for estimating the one 
dimensional RC association model using generalized linear models can be 
modified to fit the one dimensional logit model (or any generalized bilinear 
model). 

The procedure to estimate logit bilinear models is iterative and requires
starting values for the scale values. To describe the basic steps required for
the iterative portion of the procedure, we write equation 3 as

$$\ln(F_{ij1}/F_{ij2}) = \tau + \tau_{A(i)} + \tau_{B(j)} + x_i y_j \tag{11}$$

where $\phi\mu_i\nu_j = x_i y_j$. Variables A and B are declared to be classification
variables and $x$ and $y$ are alternately treated as numerical variables and
parameters that are to be estimated. In one step, equation 11 is fit to data with
the $x_i$'s treated as the values of a numerical variable and the $y_j$'s as estimated
parameters. On the next step, the new estimates of the $y_j$'s are treated as the
values of a numerical variable and the $x_i$'s are estimated parameters. This
process is repeated until the change between fitted values on successive cycles
is less than some specified criterion (i.e., a very small number). After the
solution has converged, the identification constraints are imposed on the scale
values (i.e., $\mu_i = a x_i + b$ and $\nu_j = c y_j + d$, where $a$, $b$, $c$ and $d$ are constants
such that $\sum_i \mu_i h_{A(i)} = \sum_j \nu_j h_{B(j)} = 0$ and $\sum_i \mu_i^2 h_{A(i)} = \sum_j \nu_j^2 h_{B(j)} = 1$). The
model is estimated a final time using the product $\mu_i \nu_j$ as a numerical variable
to obtain an estimate of $\phi$ and the final estimates of the other parameters in
the model. The fit statistics for the final model are correct, but the degrees
of freedom and estimated standard errors of the model parameters given by
the program are not correct. The correct degrees of freedom are given in
equation 8.
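
The alternating steps can be sketched in Python as follows. This is not the
GLIM or SAS/GENMOD module referred to in this section; it is a minimal
illustration under our own assumptions, in which a small iteratively
reweighted least squares routine stands in for the generalized linear model
procedure, aliased (redundant) columns are handled by a minimum norm least
squares solution, and the data are hypothetical expected counts generated
from a one dimensional model rather than the frequencies in Table 1. All
function and variable names are ours.

```python
import numpy as np

def fit_binomial_glm(X, y1, n, n_iter=50):
    """Binomial logit model by IRLS; lstsq tolerates aliased columns."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(n * p * (1.0 - p), 1e-10, None)    # IRLS weights
        z = eta + (y1 - n * p) / w                     # working response
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
    return beta

def g2(obs, fit):
    """Likelihood ratio statistic, with 0 * log(0) taken as 0."""
    obs, fit = np.asarray(obs, float), np.asarray(fit, float)
    pos = obs > 0
    return 2.0 * np.sum(obs[pos] * np.log(obs[pos] / fit[pos]))

def fit_logit_bilinear(y1, n, n_cycles=200, tol=1e-8, seed=0):
    """Alternate between estimating the y_j (x fixed) and the x_i (y fixed)."""
    I, J = y1.shape
    a = np.repeat(np.arange(I), J)          # category of A for each cell
    b = np.tile(np.arange(J), I)            # category of B for each cell
    yv, nv = y1.ravel().astype(float), n.ravel().astype(float)
    A, B = np.eye(I)[a], np.eye(J)[b]       # classification variables
    base = np.hstack([np.ones((I * J, 1)), A, B])
    x = np.random.default_rng(seed).normal(size=I)     # starting values
    history = []
    for _ in range(n_cycles):
        # Step 1: x_i numerical; its slope within category j of B is y_j.
        X = np.hstack([base, B * x[a][:, None]])
        y = fit_binomial_glm(X, yv, nv)[-J:]
        # Step 2: y_j numerical; its slope within category i of A is x_i.
        X = np.hstack([base, A * y[b][:, None]])
        beta = fit_binomial_glm(X, yv, nv)
        x = beta[-I:]
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        history.append(g2(np.r_[yv, nv - yv], np.r_[nv * p, nv * (1.0 - p)]))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return x, y, p.reshape(I, J), history

# Hypothetical 4 x 6 table of expected counts from a one dimensional model.
mu_true = np.array([-0.5, -0.5, 0.0, 1.0])
nu_true = np.array([1.0, 0.2, 0.0, -0.2, -0.4, -0.6])
eta_true = (0.1 + np.linspace(-0.3, 0.3, 4)[:, None]
            + np.linspace(0.4, -0.4, 6)[None, :]
            + 1.5 * np.outer(mu_true, nu_true))
n = np.full((4, 6), 200.0)
y1 = n / (1.0 + np.exp(-eta_true))
x, y, p_hat, history = fit_logit_bilinear(y1, n)
```

The returned x and y are identified only up to location and scale, so the
identification constraints and the final fit for the association parameter
would still have to be carried out as described in the text. Each step fits a
model that contains the previous solution, so the values stored in history
should not increase from one cycle to the next.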

Since the procedure requires iteratively fitting models, in practice, mod- 
ules written to perform the cycles and steps within each cycle are used. 
Becker (1989b) describes one such module for the program GLIM. A module 
using SAS/GENMOD is available from the author. One advantage of this 
procedure is that it uses existing and generally available software. Another 
advantage is that the model statement can be readily modified to fit more 
complex bilinear models. For example, more variables can be included and
additional bilinear terms for other 2-way interactions can be estimated (e.g.,
Anderson & Wasserman, 1995). The major disadvantage of this method is
that it cannot be used to estimate multiple dimensions, which is why we turn 
to a second method of estimation. 






3.2 Multi-Dimensional Model



The estimation procedure described here can be used to estimate both uni-
and multidimensional models. Equation 3 is a special case of one class
of 3-mode association models proposed by Anderson (1996). Three-mode
association models, which are log multiplicative models, are extensions of
the saturated loglinear model for 3-way tables, as well as generalizations
of the RC(M) association model to 3-way tables. The 3-mode association
model that is equivalent to the logit multiplicative model is

$$\ln(F_{ijk}) = \lambda + \lambda_{A(i)} + \lambda_{B(j)} + \lambda_{C(k)} + \lambda_{AB(ij)} + \lambda_{AC(ik)} + \lambda_{BC(jk)} + \sum_{r=1}^{R}\sum_{s=1}^{S}\sum_{t=1}^{T} \phi_{rst}\,\mu_{ir}\,\nu_{js}\,\eta_{kt} \tag{12}$$

where the terms $\lambda$, $\lambda_{A(i)}$, $\lambda_{B(j)}$, $\lambda_{C(k)}$, $\lambda_{AB(ij)}$, $\lambda_{AC(ik)}$, and $\lambda_{BC(jk)}$ are marginal
effect terms for the various margins of the table; $\mu_{ir}$, $\nu_{js}$ and $\eta_{kt}$ are scale
values for variables A, B and C, respectively, on dimensions $r$, $s$, and $t$,
respectively; and $\phi_{rst}$ is the association parameter measuring the strength
of the relationship among dimensions $r$, $s$ and $t$. Given the identification
constraints for the 3-mode association model (Anderson, 1996), if variable
C only has 2 levels (i.e., $k = 1, 2$), then $T = 1$, $R = S$, $\eta_{11} = -\eta_{21} = 1/\sqrt{2}$,
and $\phi_{rs1} = 0$ for $r \neq s$, such that equation 12 reduces to

$$\ln(F_{ijk}) = \lambda + \lambda_{A(i)} + \lambda_{B(j)} + \lambda_{C(k)} + \lambda_{AB(ij)} + \lambda_{AC(ik)} + \lambda_{BC(jk)} + \sum_{r=1}^{R} \phi_{rr1}\,\mu_{ir}\,\nu_{jr}\,\eta_{k1} \tag{13}$$

This log multiplicative model is equivalent to the logit multiplicative model
given in equation 3. This equivalence becomes readily apparent when we
express $\ln(F_{ij1}/F_{ij2})$ in terms of the parameters of equation 13:

$$\ln(F_{ij1}/F_{ij2}) = (\lambda_{C(1)} - \lambda_{C(2)}) + (\lambda_{AC(i1)} - \lambda_{AC(i2)}) + (\lambda_{BC(j1)} - \lambda_{BC(j2)}) + \sum_{r=1}^{R} \sqrt{2}\,\phi_{rr1}\,\mu_{ir}\,\nu_{jr}$$
The correspondence between parameters in equations 3 and 13 is

$$\begin{aligned}
\tau &= (\lambda_{C(1)} - \lambda_{C(2)}) \\
\tau_{A(i)} &= (\lambda_{AC(i1)} - \lambda_{AC(i2)}) \\
\tau_{B(j)} &= (\lambda_{BC(j1)} - \lambda_{BC(j2)}) \\
\phi_m &= \sqrt{2}\,\phi_{rr1} \\
\mu_{im} &= \mu_{ir} \\
\nu_{jm} &= \nu_{jr}
\end{aligned}$$


Since the logit and log multiplicative models are equivalent, a program that 
fits equation 12 (and thus equation 13) can be used to fit the multidimensional 
logit multiplicative model. 

Anderson (1993, 1996) gives maximum likelihood equations and an
algorithm using a univariate Newton-Raphson procedure to fit 3-mode association
models, including model 12. The FORTRAN program, Smode, implementing
this algorithm is available from the author. The major disadvantage of this
method is that a global solution is not guaranteed (Anderson, 1996;
Haberman, 1995); however, in practice, the method works well. Furthermore, to
help ensure convergence to a global solution, the program can be run
repeatedly with different starting values.



4 Example 

As an example, logit multiplicative models are fit to data from a study by
Hsieh (1996) on the effects of extra-curricular tutoring programs on
mathematics achievement test scores of elementary school children in Taiwan. High
achievement test scores are critical for children to gain access to higher
educational and thus occupational opportunities. One question in this study
is what the potential factors or determinants are of whether a student attends
an extra-curricular tutoring program. These tutoring programs, known as
"cramming schools", purport to increase students' achievement test scores.
The data used here, given in Table 1, consist of frequencies of children
cross-classified by their grade level (3rd, 4th, 5th, or 6th), the highest education
level attained by their father (less than a sixth grade education, graduated
from elementary school, graduated from junior high, graduated from high
school, graduated from college, or attended graduate or professional school),
and whether the student attends a cramming school in mathematics. The
goal of this analysis is to determine whether grade level and/or father's
education are related to whether a student attends a cramming school. Whether
a student attends a cramming school is the criterion variable, and grade level
and father's education are the explanatory or predictor variables.

Logit models for various numbers of dimensions were fit to the data in
Table 1. The fit statistics for these models are reported in Table 2, where
$G^2$ is the likelihood ratio statistic and $X^2$ is Pearson's chi-square statistic.
The first column of Table 2 indicates how many dimensions were fit. The
first model with $M = 0$, which is the additive effects logit model (i.e., the
no 3-factor interaction loglinear model), does not fit the data. A 3-way
interaction among grade level, father's education, and whether a student
attends cramming school exists. Rather than having to settle for the saturated
logit model (the last model in the table, $M = 3$), we have two intermediate
models ($M = 1$ and $M = 2$), both of which appear to fit based on the global
fit statistics.
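
The p-values for the likelihood ratio statistics can be reproduced from the
reported statistics and degrees of freedom. The Python sketch below does this
with a hand-written chi-square upper tail function based on the standard
series for the regularized incomplete gamma function; the function name is
ours, and a statistics library routine such as `scipy.stats.chi2.sf` could be
used instead.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(df/2, x/2) and returns 1 - P.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a))

# (M, df, G^2) for the unsaturated models, as reported in Table 2.
reported = [(0, 15, 29.33), (1, 8, 11.91), (2, 3, 0.75)]
p_values = {M: chi2_sf(g2_stat, df) for M, df, g2_stat in reported}
```

Under this calculation the additive model ($M = 0$) has a p-value between
.01 and .025, while the one and two dimensional models have p-values well
above conventional significance levels.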

The (standardized) residuals and estimated parameters for both of the
one and two dimensional models were examined. The fitted values from
the one dimensional model are reported in Table 1. Comparing these to
the observed frequencies, there is an unusually large residual for fourth graders
whose fathers graduated from junior high. The estimated scale values and
association parameters for the two dimensional model reveal that the second
dimension essentially accounts for the cell for fourth graders whose fathers
completed junior high. We selected the one dimensional model as the better
of the two models partially on the basis of parsimony and partially on the
basis of its interpretation, which is given below.

To describe the interaction between grade level and father's education
with respect to whether a student attends cramming school, we examine the
estimated scale values of the one dimensional model. The estimates of the
model parameters are reported in Table 3 and the estimated scale values are
plotted in Figure 1. In estimating these parameters, zero sum constraints
were imposed on the main (marginal) effect terms and unit weights (i.e.,
$h_{G(i)} = h_{F(j)} = 1$) were used for the scale values.

In Figure 1 (or Table 3), we see that the scale values for students whose
fathers have had some education are nearly equivalent and that the scale values
Table 1: Frequencies (first row) and fitted values (second row) from the
logit(1) multiplicative model cross-classified by whether a student attends
cramming school, student's grade level, and father's education level.

                            Father's Education Level
Cramming  Grade                     Junior   Senior             Post-
School    Level      None  Primary    High     High  College    Grad.
                        0        1       2        3        4        5

Yes       3rd           3        1       3       13       35        4
                     3.00     0.54    3.94    14.65    33.79     3.08
          4th           1        2       9       19       27        4
                     0.99     1.88    4.83    17.75    29.77     6.78
          5th           0        4      12       35       44       10
                     0.01     4.59   15.30    34.59    42.41     8.10
          6th           0       19      25       64       51        5
                     0.00    18.99   24.93    64.01    51.03     5.04

No        3rd           2        2       8       29       76        4
                     2.00     2.46    7.06    27.35    77.21     4.92
          4th           1        8       4       30       67       13
                     1.01     8.12    8.17    31.25    64.23    10.22
          5th           3       16      24       48       72        8
                     2.99    15.41   20.70    48.41    73.59     9.90
          6th           4       13      22       47       78        7
                     4.00    13.01   22.07    46.99    77.97     6.96




[Figure 1 (plot not reproduced): estimated scale values for father’s
education (ν_j; points labeled “None” and “Some”) and for grade level (μ_i),
each plotted on an axis running from -1.0 to 1.0.]

Figure 1: Estimated scale values from the logit(1) multiplicative model fit
to the odds of attending cramming school cross-classified by grade level and
father’s education.
Table 2: Fit statistics for logit multiplicative models for various numbers of
dimensions.

Model (M)    df       G²   p-value       X²   p-value

    0        15    29.33      .01     27.71      .02
    1         8    11.91      .16     12.03      .15
    2         3      .75      .86       .87      .93
    3         0      .00     1.00       .00     1.00

for students in the third, fourth and fifth grades are also nearly equivalent.
This implies that odds ratios for students in grades 3 through 5 and whose
fathers have had some education are nearly equivalent (i.e., θ_{ii',jj'} ≈ 1).
There is a relatively large distance between the scale values for “None” and
“Some” and between the scale values for grades 3 through 5 and grade 6. The
observed pattern of scale values indicates that the odds that children in the
3rd, 4th and 5th grades attend a cramming school are greater than the odds
for children in the 6th grade when their fathers have no education versus
when their fathers have had some education. The odds that children in the 6th
grade attend a cramming school are greater than those for children in the
younger grades given that their parents had more than an elementary school
education. Overall (except for children whose fathers have the lowest level
of education), the odds (and in this case, probability) that children attend
a tutoring program are larger when they are in the sixth grade versus one of
the other grades.

We should note that the estimated association parameter φ = 220.62 is
extremely large. Given that the difference between scale values for “Some”
and “None” is approximately equal to 1 and that the difference between
the scale values for sixth graders and children in the other grades is a little
larger than 1, the odds ratio for sixth graders whose father has had some
education versus no education is more than exp(220.62) (a very large number)
times larger than the corresponding odds ratios for children in the other
grades. Alternatively, the odds ratio for children whose father has had some
education and who are in the sixth versus one of the other grades is more
than exp(220.62) times larger than the corresponding odds ratios for children

Table 3: Parameter estimates from the logit bilinear model (i.e., M = 1).

Variable             Category   Marginal Effect   Scale Value

--                   --              -12.3351     φ   = 220.6313

Grade Level          3rd              11.7323     μ_1 =    .2984
                     4th              11.7096     μ_2 =    .2964
                     5th              10.9070     μ_3 =    .2709
                     6th             -34.3490     μ_4 =   -.8658

Father’s Education   None            -59.1020     ν_1 =    .9128
                     1                11.3814     ν_2 =   -.1867
                     2                12.0125     ν_3 =   -.1821
                     3                12.0309     ν_4 =   -.1830
                     4                11.6925     ν_5 =   -.1809
                     5                11.9845     ν_6 =   -.1799


whose fathers have had no education. 
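
The size of these odds ratios can be checked directly from the scale values. The sketch below assumes that, in the one dimensional model, the log odds ratio for grades i versus i' and education levels j versus j' equals φ (μ_i - μ_i') (ν_j - ν_j'), since the marginal effects cancel. It uses the rounded estimates from Table 3, and the function name is hypothetical.

```python
# Rounded estimates transcribed from Table 3.
PHI = 220.6313
GRADE_SCALE = [0.2984, 0.2964, 0.2709, -0.8658]                  # 3rd, 4th, 5th, 6th
FATHER_SCALE = [0.9128, -0.1867, -0.1821, -0.1830, -0.1809, -0.1799]  # codes 0..5

def log_odds_ratio(g1, g2, f1, f2):
    """Log odds ratio for grade g1 vs g2 and education code f1 vs f2."""
    return PHI * (GRADE_SCALE[g1] - GRADE_SCALE[g2]) * (FATHER_SCALE[f1] - FATHER_SCALE[f2])

# Sixth grade (index 3) versus each other grade, some education versus none (code 0).
sixth_vs_other = [log_odds_ratio(3, g, f, 0) for g in range(3) for f in range(1, 6)]
print("smallest log odds ratio, 6th vs other grades:", round(min(sixth_vs_other), 2))

# Third versus fourth grade, education codes 1 versus 2: scale values nearly equal.
print("log odds ratio, 3rd vs 4th, codes 1 vs 2:", round(log_odds_ratio(0, 1, 1, 2), 4))
```

Under this assumption every comparison of sixth graders with another grade, for some versus no education, has a log odds ratio larger than 220.62, while comparisons within grades 3 through 5 and within the "some education" categories are close to zero on the log scale (odds ratios close to 1).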

In this data set, the interaction between father’s education and grade level
is due to the sixth graders and the students whose fathers had no formal
education. The additive logit model fit to all the data except the cell for
the sixth graders whose father had no education does not adequately fit the
data (i.e., G² = 24.53, df = 14, p-value = .03).¹ However, if we delete
the sixth graders and those students whose fathers had no education, we
find that the additive logit model with just marginal effects for grade and
father’s education (i.e., the model without an interaction between grade level
and father’s education) fits the data quite well (G² = 11.80, df = 8,
p-value = .16).² Furthermore, the additive logit model fit to data with just
children in the sixth grade deleted provides an adequate fit (G² = 16.38,
df = 10, p-value = .09), and the model fit to the data with just children



¹ Fitting this model makes use of methodology for incomplete tables. The odds
for sixth graders whose father had no education was fit perfectly, which uses
up one degree of freedom relative to the logit model with M = 0.

² This result was an additional reason for selecting the model with one versus
two dimensions.
whose father had no education deleted also fits adequately (G² = 20.08,
df = 12, p-value = .07). While the nature of the interaction in this example
is a rather simple one, it illustrates the power of logit multiplicative models
with respect to identifying where the interaction is and their ability to give
a parsimonious representation of the data.

5 Discussion 

The logit model extension proposed here provides a means not just of testing
whether there is a relationship between discretely measured variables, but
also of obtaining a metric description and representation of the interactions.
While the specific logit model described here was designed for the case of
one dichotomous response or criterion variable and two explanatory variables,
extending this model to polytomous responses and/or more predictor variables,
in a fashion similar to the way the RC(M) association model has been extended
to higher-way tables, is straightforward (see especially Becker & Clogg, 1989;
Clogg & Shihadeh, 1994).

The new logit model and related models such as the RC(M) association
model and its generalizations are very powerful tools for representing
and describing associations in cross-classifications. Such models have been 
primarily (and successfully) used in sociology (e.g., Clogg, 1982b; Clogg, 
Eliason & Wahl, 1990; Faust & Wasserman, 1993; Yamaguchi, 1987; Xie, 
1992). Clogg (1982b) gives just a sample of the potential applications of 
these models. Examples of their use in educational research are surprisingly 
rare, especially given that variables in educational research are often mea- 
sured discretely (e.g., see first paragraph of this paper). These models can 
be used in observational studies such as the one described in this paper or 
in qualitative studies where behaviors are observed and coded according to 
some defined scheme (e.g., Anderson & Kramer, 1996). 

Due to the latent (continuous) variable interpretations of the models (e.g.,
Bartholomew, 1980, 1987; Goodman, 1981, 1985; Lauritzen and Wermuth,
1989; Whittaker, 1989), the models have potential applications in the area
of educational measurement where concern is focused on the measurement
of underlying abilities. The resemblance of equation 10 to an item response
model is not a coincidence. There are very close relationships between
models for categorical data and more commonly (and some not so commonly)
known latent variable models (e.g., see Agresti, 1995; Anderson, 1986; Clogg,
1982b).

Logit multiplicative models, as well as the RC(M) association model and 
its various generalizations, are relatively recent developments in the method- 
ology for categorical data analysis. Given the wide range of potential applica- 
tions in educational research for such models, we anticipate that researchers 
will find these models valuable tools. 